## [1] "Loading the following libraries using lb_myRequiredPackages: data.table"
## [2] "Loading the following libraries using lb_myRequiredPackages: lubridate"
## [3] "Loading the following libraries using lb_myRequiredPackages: ggplot2"
## [4] "Loading the following libraries using lb_myRequiredPackages: readr"
## [5] "Loading the following libraries using lb_myRequiredPackages: plotly"
## [6] "Loading the following libraries using lb_myRequiredPackages: knitr"
This report extracts and visualises tweets and re-tweets containing #dockercon for 17-21 April 2017 (DockerCon17).
The code borrows extensively from http://thinktostart.com/twitter-authentification-with-r/
The data should already have been downloaded using collectData.R. After some processing, this produces a data table with the following variables:
## [1] "text" "favorited" "favoriteCount"
## [4] "replyToSN" "created" "truncated"
## [7] "replyToSID" "id" "replyToUID"
## [10] "statusSource" "screenName" "retweetCount"
## [13] "isRetweet" "retweeted" "longitude"
## [16] "latitude" "location" "language"
## [19] "profileImageURL" "createdLocal" "obsDateTimeMins"
## [22] "obsDateTimeHours" "obsDateTime5m" "obsDateTime10m"
## [25] "obsDateTime15m" "obsDate" "isRetweetLab"
The table has 7,975 tweets (and 11,285 re-tweets) from 6,429 tweeters between 2017-04-16 19:01:03 and 2017-04-21 09:54:11 (Central Daylight Time).
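The derived time variables and the summary counts above could be produced along the following lines. This is a minimal sketch, not the report's actual code: the toy `tweetsDT` table and its values are invented for illustration, and the `America/Chicago` zone name is an assumption based on the conference being held in Austin.

```r
library(data.table)
library(lubridate)

# Toy stand-in for the processed tweet table (values invented for illustration)
tweetsDT <- data.table(
  screenName = c("a", "b", "a", "c"),
  created = as.POSIXct(c("2017-04-17 14:01:03", "2017-04-17 14:03:40",
                         "2017-04-17 14:07:10", "2017-04-18 00:54:11"),
                       tz = "UTC"),
  isRetweet = c(FALSE, TRUE, FALSE, TRUE)
)

# Convert the UTC timestamps to local conference time (assumed zone name)
tweetsDT[, createdLocal := with_tz(created, tzone = "America/Chicago")]

# Floor each local timestamp to the start of its 5-minute bin
tweetsDT[, obsDateTime5m := floor_date(createdLocal, unit = "5 minutes")]

# Local calendar date of each (re)tweet
tweetsDT[, obsDate := as.Date(createdLocal, tz = "America/Chicago")]

# Summary counts of the kind quoted in the text
nTweets   <- tweetsDT[isRetweet == FALSE, .N]
nRetweets <- tweetsDT[isRetweet == TRUE, .N]
nTweeters <- uniqueN(tweetsDT$screenName)
```

The same `floor_date()` call with `unit = "10 minutes"`, `"15 minutes"` or `"hour"` would give the coarser bins listed among the variables.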
All (re)tweets containing #dockercon 2017-04-17 to 2017-04-21
This plot is zoomable - try it!
All (re)tweets containing #dockercon Monday 17th April 2017
This plot is zoomable - try it!
All (re)tweets containing #dockercon Tuesday 18th April 2017
This plot is zoomable - try it!
All (re)tweets containing #dockercon Wednesday 19th April 2017
All (re)tweets containing #dockercon Thursday 20th April 2017
We wanted to make a nice map but sadly most tweets have no lat/long set (19,205 of the 19,260 tweets and re-tweets).
| latitude | longitude | nTweets |
|---|---|---|
| NA | NA | 19205 |
| 30.26416397 | -97.73961067 | 2 |
| 30.26857 | -97.73617 | 1 |
| 30.2625 | -97.7401 | 31 |
| 30.26470908 | -97.7417368 | 1 |
| 30.20226566 | -97.66722505 | 1 |
| 42.36488267 | -71.02168356 | 1 |
| 37.61697678 | -122.38427689 | 1 |
| 30.2672 | -97.7639 | 3 |
| 30.2635554 | -97.7399303 | 1 |
| 30.2591 | -97.7384 | 1 |
| 30.26622515 | -97.74327721 | 1 |
| 30.26037 | -97.73848 | 3 |
| 30.258201 | -97.71264 | 1 |
| 30.25888 | -97.73841 | 2 |
| 30.259714 | -97.73940054 | 1 |
| 30.26006 | -97.73813 | 1 |
| 30.26006 | -97.73859 | 1 |
| 30.26036009 | -97.73848483 | 1 |
| 30.20243954 | -97.66718069 | 1 |
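A table like the one above can be built by counting rows per latitude/longitude pair. The sketch below assumes a data.table with `latitude` and `longitude` columns as listed earlier; the toy values are invented, and `geoCounts` and `propMissing` are hypothetical names.

```r
library(data.table)

# Toy stand-in for the tweet table (values invented for illustration)
tweetsDT <- data.table(
  latitude  = c(NA, NA, NA, 30.2625, 30.2625, 30.2672),
  longitude = c(NA, NA, NA, -97.7401, -97.7401, -97.7639)
)

# Count (re)tweets per coordinate pair; rows with NA coordinates form one group
geoCounts <- tweetsDT[, .(nTweets = .N), by = .(latitude, longitude)]

# Share of (re)tweets with no usable coordinates
propMissing <- tweetsDT[, mean(is.na(latitude) | is.na(longitude))]
```

Checking `propMissing` before attempting a map makes it clear early on whether there is enough geocoded data to plot.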
The location variable appears to be pulled from the user’s profile, although it may also be a ‘guesstimate’ of the current location.
Top locations for tweets:
| location | nTweets |
|---|---|
| NA | 2968 |
| San Francisco, CA | 1333 |
| San Francisco | 539 |
| Austin, TX | 344 |
| Seattle, WA | 245 |
| Silicon Valley, CA | 228 |
| Paris | 191 |
| Islamabad, Pakistan | 149 |
| London | 142 |
| New York, NY | 129 |
| Charlotte, NC | 121 |
| San Jose, CA | 121 |
| Boston, MA | 112 |
| USA | 108 |
| Boulder, CO | 107 |
Top locations for tweeters:
| location | nTweeters |
|---|---|
| NA | 1195 |
| San Francisco, CA | 182 |
| Austin, TX | 89 |
| San Francisco | 61 |
| Seattle, WA | 52 |
| New York, NY | 45 |
| Paris | 45 |
| San Jose, CA | 42 |
| London, England | 37 |
| Paris, France | 36 |
| London | 33 |
| Palo Alto, CA | 31 |
| New York | 29 |
| France | 29 |
| Boston, MA | 28 |
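The two location tables differ only in what is counted: rows (tweets) versus distinct screen names (tweeters). A minimal sketch of both counts in one pass, assuming `screenName` and `location` columns as listed earlier; the toy data and the `locCounts` name are invented for illustration.

```r
library(data.table)

# Toy stand-in for the tweet table (values invented for illustration)
tweetsDT <- data.table(
  screenName = c("a", "a", "a", "b", "c", "d"),
  location   = c("San Francisco, CA", "San Francisco, CA", "San Francisco, CA",
                 "Austin, TX", "Austin, TX", NA)
)

# Per location: number of (re)tweets and number of distinct tweeters
locCounts <- tweetsDT[, .(nTweets = .N, nTweeters = uniqueN(screenName)),
                      by = location]

# Most active locations first; keep the top 15 as in the tables above
setorder(locCounts, -nTweets)
topLocations <- head(locCounts, 15)
```

Because the location text is free-form, variants such as "San Francisco, CA" and "San Francisco" are counted as separate locations, as the tables above show.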
Next we’ll try by screen name.
Top tweeters:
| screenName | nTweets |
|---|---|
| DockerCon | 335 |
| theCUBE | 187 |
| BettyJunod | 148 |
| jpetazzo | 129 |
| climbingkujira | 127 |
| solomonstre | 126 |
| jeanepaul | 104 |
| ManoMarks | 99 |
| kaslinfields | 94 |
| OpenShiftNinja | 89 |
| vmblog | 86 |
| sitspak | 85 |
| SFoskett | 82 |
| jameskobielus | 77 |
| stefscherer | 75 |
And here’s a really bad visualisation of all of them tweeting over time! Each row of pixels is a tweeter (the names are illegible) and a green dot indicates a few tweets in the 5 minute period while a red dot indicates a lot of tweets.
N tweets per 5 minutes by screen name
So let’s re-do that for the top 50 tweeters so we can see their tweetStreaks!
N tweets per 5 minutes by screen name (top 50, most prolific tweeters at bottom)
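One way to build this kind of plot is to pick the most prolific screen names, count their tweets per 5-minute bin and draw one tile per count. The sketch below is an illustration under stated assumptions rather than the report's own code: the toy data, the `topN` value and the green-to-red colour scale are chosen to match the description above.

```r
library(data.table)
library(ggplot2)

# Toy stand-in for the tweet table (values invented for illustration)
tweetsDT <- data.table(
  screenName = c("a", "a", "a", "b", "b", "c"),
  obsDateTime5m = as.POSIXct(c("2017-04-17 09:00:00", "2017-04-17 09:00:00",
                               "2017-04-17 09:05:00", "2017-04-17 09:00:00",
                               "2017-04-17 09:10:00", "2017-04-17 09:05:00"),
                             tz = "America/Chicago")
)

topN <- 2  # the report uses 50; 2 keeps the toy example small

# Rank tweeters by total (re)tweets and keep the top N names
tweeterCounts <- tweetsDT[, .(nTweets = .N), by = screenName][order(-nTweets)]
topTweeters <- head(tweeterCounts$screenName, topN)

# Count tweets per tweeter per 5-minute bin for the top N only
plotDT <- tweetsDT[screenName %in% topTweeters, .(nTweets = .N),
                   by = .(screenName, obsDateTime5m)]

# The first factor level is drawn lowest on a discrete y axis,
# so the most prolific tweeter ends up at the bottom
plotDT[, screenName := factor(screenName, levels = topTweeters)]

p <- ggplot(plotDT, aes(x = obsDateTime5m, y = screenName, fill = nTweets)) +
  geom_tile() +
  scale_fill_gradient(low = "green", high = "red") +
  labs(x = "Time (5 minute bins)", y = "Screen name", fill = "N tweets")
```

Printing `p` draws the static plot; wrapping it in `plotly::ggplotly(p)` would give a zoomable version.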
Analysis completed in: 58.75 seconds using knitr in RStudio with R version 3.3.3 (2017-03-06) running on x86_64-apple-darwin13.4.0.
A special mention must go to twitteR (Gentry, n.d.) for the Twitter API interaction functions and to lubridate (Grolemund and Wickham 2011), which allows timezone manipulation without too many tears.
Other R packages used:
Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.
Gentry, Jeff. n.d. TwitteR: R Based Twitter Client. http://lists.hexdump.org/listinfo.cgi/twitter-users-hexdump.org.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.
Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.